Results 1 - 20 of 128
1.
IEEE Trans Biomed Eng ; PP: 2024 May 03.
Article in English | MEDLINE | ID: mdl-38700959

ABSTRACT

OBJECTIVE: Early diagnosis of cardiovascular diseases is a crucial task in medical practice. With the application of computer audition in the healthcare field, artificial intelligence (AI) has been applied to clinical non-invasive intelligent auscultation of heart sounds to provide rapid and effective pre-screening. However, AI models generally require large amounts of data, which may raise privacy issues, and it is difficult to collect large amounts of healthcare data from a single centre. METHODS: In this study, we propose federated learning (FL) optimisation strategies for practical application to multi-centre institutional heart sound databases. Horizontal FL is mainly employed to tackle the privacy problem by aligning the feature spaces of the participating institutions without information leakage. In addition, deep learning techniques have poor interpretability due to their "black-box" property, which limits the feasibility of AI on real medical data. To this end, vertical FL is utilised to address the issues of model interpretability and data scarcity. CONCLUSION: Experimental results demonstrate that the proposed FL framework can achieve good performance for heart sound abnormality detection while taking personal privacy protection into account. Moreover, using the federated feature space is beneficial to balance the interpretability of vertical FL and the privacy of the data. SIGNIFICANCE: This work realises the potential of FL from research to clinical practice and is expected to find extensive application in federated smart medical systems.
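As an illustration of the horizontal FL setting described in this abstract, here is a minimal federated averaging (FedAvg) sketch in which institutions share model weights rather than raw heart sound data. It is a generic sketch, not the authors' optimisation strategy; the layer shapes and client sizes are placeholders.

```python
import numpy as np

def fedavg_round(client_weights, client_sizes):
    """One FedAvg aggregation step: average the clients' model weights,
    weighted by each institution's local sample count."""
    total = sum(client_sizes)
    n_layers = len(client_weights[0])
    return [
        sum(w[l] * (n / total) for w, n in zip(client_weights, client_sizes))
        for l in range(n_layers)
    ]

# Three hypothetical institutions share model weights only, never raw heart sounds.
rng = np.random.default_rng(0)
clients = [[rng.normal(size=(4, 4)), rng.normal(size=4)] for _ in range(3)]
global_weights = fedavg_round(clients, client_sizes=[120, 300, 80])
```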

2.
Article in English | MEDLINE | ID: mdl-38696290

ABSTRACT

Due to the objectivity of emotional expression in the central nervous system, EEG-based emotion recognition can effectively reflect humans' internal emotional states. In recent years, convolutional neural networks (CNNs) and recurrent neural networks (RNNs) have made significant strides in extracting local features and temporal dependencies from EEG signals. However, CNNs ignore spatial distribution information from EEG electrodes, and RNNs may encounter issues such as exploding/vanishing gradients and high time consumption. To address these limitations, we propose an attention-based temporal graph representation network (ATGRNet) for EEG-based emotion recognition. First, a hierarchical attention mechanism is introduced to integrate feature representations from both frequency bands and channels ordered by priority in EEG signals. Second, a graph convolutional neural network with a top-k operation is utilized to capture internal relationships between EEG electrodes under different emotion patterns. Next, a residual-based graph readout mechanism is applied to accumulate the node-level EEG feature representations into graph-level representations. Finally, the obtained graph-level representations are fed into a temporal convolutional network (TCN) to extract the temporal dependencies between EEG frames. We evaluated the proposed ATGRNet on the SEED, DEAP and FACED datasets. The experimental findings show that ATGRNet surpasses state-of-the-art graph-based methods for EEG-based emotion recognition.
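The top-k operation and graph-level readout described in this abstract can be illustrated with a minimal sketch; the scoring layer, sigmoid gating, and mean pooling here are assumptions, not ATGRNet's exact block.

```python
import torch

def topk_readout(node_feats: torch.Tensor, scorer: torch.nn.Linear, k: int) -> torch.Tensor:
    """Score EEG electrode nodes, keep the k most salient, and mean-pool
    their gated features into a single graph-level vector."""
    scores = scorer(node_feats).squeeze(-1)            # (num_nodes,)
    top_vals, top_idx = scores.topk(k)
    gated = node_feats[top_idx] * torch.sigmoid(top_vals).unsqueeze(-1)
    return gated.mean(dim=0)                           # (feat_dim,)

feats = torch.randn(62, 32)                            # e.g. 62 electrodes, 32-dim features
graph_vector = topk_readout(feats, torch.nn.Linear(32, 1), k=10)
```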

3.
Patterns (N Y) ; 5(3): 100932, 2024 Mar 08.
Article in English | MEDLINE | ID: mdl-38487806

ABSTRACT

Along with propagating the input toward making a prediction, Bayesian neural networks also propagate uncertainty. This has the potential to guide the training process by rejecting predictions of low confidence, and recent variational Bayesian methods can do so without Monte Carlo sampling of weights. Here, we apply sample-free methods for wildlife call detection on recordings made via passive acoustic monitoring equipment in the animals' natural habitats. We further propose uncertainty-aware label smoothing, where the smoothing probability depends on sample-free predictive uncertainty, in order to down-weight data samples that should contribute less to the loss value. We introduce a bioacoustic dataset recorded in Malaysian Borneo, containing overlapping calls from 30 species. On that dataset, our proposed method achieves an absolute improvement of around 1.5 percentage points in area under the receiver operating characteristic curve (AU-ROC), 13 points in F1, and 19.5 points in expected calibration error (ECE) compared to the point-estimate network baseline, averaged across all target classes.
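A minimal sketch of uncertainty-aware label smoothing, assuming a per-sample predictive uncertainty in [0, 1] is already available from the sample-free network; the linear scaling rule is illustrative rather than the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def uncertainty_smoothed_loss(logits, targets, uncertainty, max_eps=0.2):
    """Cross-entropy with label smoothing whose strength grows with each
    sample's predictive uncertainty, down-weighting unreliable samples."""
    n_classes = logits.size(-1)
    eps = max_eps * uncertainty.clamp(0, 1)                      # (batch,)
    one_hot = F.one_hot(targets, n_classes).float()
    soft = one_hot * (1 - eps).unsqueeze(-1) + eps.unsqueeze(-1) / n_classes
    return -(soft * F.log_softmax(logits, dim=-1)).sum(-1).mean()

# 8 samples, 30 call classes, random per-sample uncertainties
loss = uncertainty_smoothed_loss(
    torch.randn(8, 30), torch.randint(0, 30, (8,)), torch.rand(8))
```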

4.
J Affect Disord ; 355: 40-49, 2024 Jun 15.
Article in English | MEDLINE | ID: mdl-38552911

ABSTRACT

BACKGROUND: Prior research has associated spoken language use with depression, yet studies often involve small or non-clinical samples and face challenges in the manual transcription of speech. This paper aimed to automatically identify depression-related topics in speech recordings collected from clinical samples. METHODS: The data included 3919 English free-response speech recordings collected via smartphones from 265 participants with a depression history. We transcribed the speech recordings via automatic speech recognition (Whisper tool, OpenAI) and identified principal topics from the transcriptions using a deep learning topic model (BERTopic). To identify depression risk topics and understand their context, we compared participants' depression severity and behavioral (extracted from wearable devices) and linguistic (extracted from transcribed texts) characteristics across the identified topics. RESULTS: Of the 29 topics identified, 6 were depression risk topics: 'No Expectations', 'Sleep', 'Mental Therapy', 'Haircut', 'Studying', and 'Coursework'. Participants mentioning depression risk topics exhibited higher sleep variability, later sleep onset, and fewer daily steps, and used fewer words, more negative language, and fewer leisure-related words in their speech recordings. LIMITATIONS: Our findings were derived from a depressed cohort with a specific speech task, potentially limiting the generalizability to non-clinical populations or other speech tasks. Additionally, some topics had small sample sizes, necessitating further validation in larger datasets. CONCLUSION: This study demonstrates that specific speech topics can indicate depression severity. The employed data-driven workflow provides a practical approach for analyzing large-scale speech data collected from real-world settings.


Subjects
Deep Learning , Speech , Humans , Smartphone , Depression/diagnosis , Speech Recognition Software
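The transcription-plus-topic-modelling workflow described in this abstract can be sketched with the two tools it names; the model size, file paths, and default settings below are assumptions, not the study's configuration, and BERTopic needs many documents in practice.

```python
import whisper                     # pip install openai-whisper
from bertopic import BERTopic      # pip install bertopic

# Transcribe each smartphone recording with Whisper (model size assumed).
asr = whisper.load_model("base")
recordings = ["rec_001.wav", "rec_002.wav"]          # placeholder paths
transcripts = [asr.transcribe(path)["text"] for path in recordings]

# Discover principal topics across the transcripts with BERTopic (defaults).
topic_model = BERTopic()
topics, probs = topic_model.fit_transform(transcripts)
print(topic_model.get_topic_info())                  # inspect discovered topics
```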
5.
Patterns (N Y) ; 5(3): 100952, 2024 Mar 08.
Article in English | MEDLINE | ID: mdl-38487807

ABSTRACT

In their recent publication in Patterns, the authors proposed a methodology based on sample-free Bayesian neural networks and label smoothing to improve both predictive and calibration performance in animal call detection. Such approaches have the potential to foster trust in algorithmic decision making and enhance policy making in conservation applications that use recordings made by on-site passive acoustic monitoring equipment. This interview is a companion to the authors' recent paper, "Propagating Variational Model Uncertainty for Bioacoustic Call Label Smoothing".

6.
Cyborg Bionic Syst ; 5: 0075, 2024.
Article in English | MEDLINE | ID: mdl-38440319

ABSTRACT

Leveraging the power of artificial intelligence to facilitate automatic analysis and monitoring of heart sounds has attracted tremendous efforts in the past decade. Nevertheless, the lack of a standard open-access database made it difficult to sustain comparable research before the first release of the PhysioNet/CinC Challenge dataset. However, inconsistent standards for data collection, annotation, and partitioning still restrain a fair and efficient comparison between different works. To this end, we introduced and benchmarked a first version of the Heart Sounds Shenzhen (HSS) corpus. Motivated and inspired by previous works based on HSS, we redefined the tasks and conducted a comprehensive investigation of shallow and deep models in this study. First, we segmented the heart sound recordings into shorter segments (10 s), which makes the setting more similar to the human auscultation case. Second, we redefined the classification tasks: besides the three-class categorisation (normal, moderate, and mild/severe) adopted in HSS, we added a binary classification task, i.e., normal versus abnormal. We provided detailed benchmarks based on both classic machine learning and state-of-the-art deep learning technologies, reproducible using open-source toolkits. Last but not least, we analysed the feature contributions of the best-performing benchmark to make the results more convincing and interpretable.
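The first preprocessing step, cutting recordings into 10 s segments, can be sketched as follows; the file format and the choice to drop a trailing remainder shorter than one segment are assumptions.

```python
import soundfile as sf   # pip install soundfile

def segment_recording(path, segment_s=10.0):
    """Split a heart sound recording into consecutive 10 s segments,
    dropping a trailing remainder shorter than one segment."""
    audio, sr = sf.read(path)
    seg_len = int(segment_s * sr)
    n_segments = len(audio) // seg_len
    return [audio[i * seg_len:(i + 1) * seg_len] for i in range(n_segments)]

segments = segment_recording("hss_recording.wav")    # placeholder path
print(f"{len(segments)} segments of 10 s each")
```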

7.
Heliyon ; 10(1): e23142, 2024 Jan 15.
Article in English | MEDLINE | ID: mdl-38163154

ABSTRACT

Among the 17 Sustainable Development Goals (SDGs) proposed within the 2030 Agenda and adopted by all United Nations member states, the 13th SDG is a call for action to combat climate change. Moreover, SDGs 14 and 15 call for the protection and conservation of life below water and life on land, respectively. In this work, we provide a literature-based overview of application areas in which computer audition - a powerful but, in this context, so far hardly considered technology combining audio signal processing and machine intelligence - is employed to monitor our ecosystem, with the potential to identify ecologically critical processes or states. We distinguish between applications related to organisms, such as species richness analysis and plant health monitoring, and applications related to the environment, such as melting ice monitoring or wildfire detection. This work positions computer audition in relation to alternative approaches by discussing methodological strengths and limitations, as well as ethical aspects. We conclude with an urgent call to action to the research community for greater involvement of audio intelligence methodology in future ecosystem monitoring approaches.

8.
Mult Scler ; 30(1): 103-112, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38084497

ABSTRACT

INTRODUCTION: Multiple sclerosis (MS) is a leading cause of disability among young adults, but standard clinical scales may not accurately detect subtle changes in disability occurring between visits. This study aims to explore whether wearable device data provides more granular and objective measures of disability progression in MS. METHODS: Remote Assessment of Disease and Relapse in Central Nervous System Disorders (RADAR-CNS) is a longitudinal multicenter observational study in which 400 MS patients have been recruited since June 2018 and prospectively followed up for 24 months. Monitoring of patients included standard clinical visits with assessment of disability through the Expanded Disability Status Scale (EDSS), the 6-minute walking test (6MWT), and the timed 25-foot walk (T25FW), as well as remote monitoring through the use of a Fitbit. RESULTS: Among the 306 patients who completed the study (mean age, 45.6 years; 67% female), confirmed disability progression as defined by the EDSS was observed in 74 patients, who took approximately 1392 fewer daily steps than patients without disability progression. However, the decrease in the number of steps over time did not differ significantly between patients with EDSS progression and stable patients. Similar results were obtained with disability progression defined by the 6MWT and the T25FW. CONCLUSION: The use of continuous activity monitoring holds great promise as a sensitive and ecologically valid measure of disability progression in MS.


Subjects
Disabled Persons , Multiple Sclerosis , Wearable Electronic Devices , Female , Humans , Male , Middle Aged , Disability Evaluation , Multiple Sclerosis/diagnosis , Walk Test , Walking/physiology , Adult
9.
IEEE Trans Pattern Anal Mach Intell ; 46(2): 805-822, 2024 Feb.
Article in English | MEDLINE | ID: mdl-37851557

ABSTRACT

Automatically recognising apparent emotions from face and voice is hard, in part because of various sources of uncertainty, including in the input data and the labels used in a machine learning framework. This paper introduces an uncertainty-aware multimodal fusion approach that quantifies modality-wise aleatoric or data uncertainty towards emotion prediction. We propose a novel fusion framework, in which latent distributions over unimodal temporal context are learned by constraining their variance. These variance constraints, Calibration and Ordinal Ranking, are designed such that the variance estimated for a modality can represent how informative the temporal context of that modality is w.r.t. emotion recognition. When well-calibrated, modality-wise uncertainty scores indicate how much their corresponding predictions are likely to differ from the ground truth labels. Well-ranked uncertainty scores allow the ordinal ranking of different frames across different modalities. To jointly impose both these constraints, we propose a softmax distributional matching loss. Our evaluation on AVEC 2019 CES, CMU-MOSEI, and IEMOCAP datasets shows that the proposed multimodal fusion method not only improves the generalisation performance of emotion recognition models and their predictive uncertainty estimates, but also makes the models robust to novel noise patterns encountered at test time.
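The underlying principle, letting modality-wise uncertainty govern fusion, can be illustrated with simple inverse-variance weighting; this generic stand-in is not the paper's calibration and ordinal-ranking constraints or its softmax distributional matching loss.

```python
import torch

def inverse_variance_fusion(means, variances, eps=1e-6):
    """Fuse per-modality predictions by weighting each modality with the
    precision (inverse variance) of its latent distribution."""
    precisions = [1.0 / (v + eps) for v in variances]
    total = sum(precisions)
    return sum(m * p for m, p in zip(means, precisions)) / total

audio_mu, audio_var = torch.zeros(4, 2), torch.full((4, 2), 0.1)   # confident
video_mu, video_var = torch.ones(4, 2), torch.full((4, 2), 1.0)    # uncertain
fused = inverse_variance_fusion([audio_mu, video_mu], [audio_var, video_var])
# The fused prediction lies much closer to the confident audio modality.
```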

10.
Article in English | MEDLINE | ID: mdl-38082880

ABSTRACT

The manipulation and stimulation of cell growth is invaluable for neuroscience research, such as brain-machine interfaces or applications of neural tissue engineering. For the implementation of such research avenues, in particular the analysis of cells' migration behaviour, the determination of cell positions on microscope images is essential, currently necessitating labour-intensive manual annotation of the cell positions. In an attempt to automate the required annotation efforts, we i) introduce NeuroCellCentreDB, a novel dataset of neuron-like cells on microscope images with annotated cell centres, ii) evaluate a common (bounding-box-based) object detector, the faster region-based convolutional neural network (FRCNN), for the task at hand, and iii) design and test a fully convolutional neural network with the specific goal of cell centre detection. We achieve an F1 score of up to 0.766 on the test data with a tolerance radius of 16 pixels. Our code and dataset are publicly available.


Subjects
Microscopy , Neural Networks, Computer , Automation , Cell Proliferation , Neurons
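The evaluation protocol, F1 at a 16-pixel tolerance radius, can be sketched with a greedy one-to-one matching of predicted to annotated centres; the greedy strategy is an assumption, as the abstract does not specify the matching rule.

```python
import numpy as np

def centre_detection_f1(pred, gt, radius=16.0):
    """Greedy one-to-one matching of predicted to ground-truth cell centres;
    a prediction is a true positive if it lies within `radius` pixels."""
    pred, gt = np.asarray(pred, float), np.asarray(gt, float)
    unmatched = list(range(len(gt)))
    tp = 0
    for p in pred:
        if not unmatched:
            break
        dists = np.linalg.norm(gt[unmatched] - p, axis=1)
        j = int(np.argmin(dists))
        if dists[j] <= radius:
            tp += 1
            unmatched.pop(j)
    precision = tp / len(pred) if len(pred) else 0.0
    recall = tp / len(gt) if len(gt) else 0.0
    return 2 * precision * recall / (precision + recall) if tp else 0.0

print(centre_detection_f1([[10, 10], [50, 50]], [[12, 9], [200, 200]]))  # 0.5
```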
11.
Article in English | MEDLINE | ID: mdl-38083041

ABSTRACT

As the speech production mechanism is related to the breathing process, speech signals and breathing patterns impact each other. Breathing patterns are physiological signals that help in understanding the psychological, physiological, and cognitive states of an individual. Capturing such patterns relies on the availability of equipment such as respiratory belts, which are costly and uncomfortable to wear for long durations. In this paper, we attempt to extract breathing patterns from speech signals, which are easily available and can be recorded using a smartphone's microphone. In the presented work, simultaneous speech and breath signals were captured from 100 Indian participants aged 20 to 25 years while they read a phonetically balanced passage in English. We identified five distinct breathing templates, falling into two broad speech-breath categories, exhibited by the speakers while reading the same passage. For one of the two categories, time-domain features with a regression network can extract the breathing patterns from speech with a Pearson correlation coefficient of 0.70. By computational modelling, we distinguish these two breathing categories from speech with a classification accuracy of 79%.


Subjects
Reading , Speech , Humans , Young Adult , Adult , Speech/physiology , Respiration , Time Factors , Language
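A minimal sketch of the regression setup: recover a breathing signal from time-domain speech features and score it with the Pearson correlation coefficient. The frame-level features, ridge regressor, and synthetic data are assumptions standing in for the study's setup.

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.linear_model import Ridge

rng = np.random.default_rng(1)
# Synthetic stand-ins: frame-level speech features and a breath belt signal.
frames = rng.normal(size=(2000, 5))
breath = frames @ np.array([0.8, -0.3, 0.1, 0.0, 0.2]) + 0.3 * rng.normal(size=2000)

# Fit on the first half, predict the breathing pattern on the second half.
reg = Ridge().fit(frames[:1000], breath[:1000])
pred = reg.predict(frames[1000:])
r, _ = pearsonr(pred, breath[1000:])
print(f"Pearson correlation: {r:.2f}")   # quality of the recovered pattern
```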
12.
Article in English | MEDLINE | ID: mdl-38083138

ABSTRACT

In the presented work, we utilise a noisy dataset of clinical interviews with depression patients, conducted over the telephone, for the purpose of depression classification and automated detection of treatment response. In contrast to most previous studies on depression recognition from speech, our dataset does not include a healthy group of subjects who have never been diagnosed with depression. Furthermore, it contains measurements at different time points for individual subjects, making it suitable for machine learning-based detection of treatment response. In our experiments, we make use of an unsupervised feature quantisation and aggregation method, achieving 69.2% unweighted average recall (UAR) when classifying whether patients are currently in remission or experiencing a major depressive episode (MDE). The performance of our model matches that of cutoff-based classification via Hamilton Rating Scale for Depression (HRSD) scores. Finally, we show that, using speech samples, we can detect response to treatment with a UAR of 68.1%.


Subjects
Depressive Disorder, Major , Humans , Depressive Disorder, Major/diagnosis , Depressive Disorder, Major/therapy , Depression/diagnosis , Depression/therapy , Speech , Recognition, Psychology , Health Status
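Unweighted average recall (UAR), the metric reported above, is simply the mean of per-class recalls; a short sketch with scikit-learn, using illustrative labels:

```python
from sklearn.metrics import recall_score

# Example labels: 0 = remission, 1 = major depressive episode (illustrative only)
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
y_pred = [0, 0, 1, 0, 1, 1, 0, 1, 1, 0]

# UAR is the macro-averaged recall, i.e. each class counts equally
uar = recall_score(y_true, y_pred, average="macro")
print(f"UAR: {uar:.1%}")   # recall(0)=0.75, recall(1)=0.667 -> 70.8%
```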
13.
Article in English | MEDLINE | ID: mdl-38083221

ABSTRACT

According to the WHO, approximately one in six individuals worldwide will develop some form of cancer in their lifetime. Accurate and early detection of lesions is therefore crucial for improving the probability of successful treatment, reducing the need for more invasive treatments, and achieving higher rates of survival. In this work, we propose a novel R-CNN approach with pretraining and data augmentation for universal lesion detection. In particular, we incorporate asymmetric 3D context fusion (A3D) for feature extraction from 2D CT images with Hybrid Task Cascade. By doing so, we supply the network with further spatial context, refining the mask prediction over several stages and making it easier to distinguish hard foregrounds from cluttered backgrounds. Moreover, we introduce a new video pretraining method for medical imaging that uses consecutive frames from the YouTube VOS video segmentation dataset, improving our model's sensitivity by 0.8 percentage points at a rate of one false positive per image. Finally, we apply data augmentation techniques and analyse their impact on the overall performance of our models at various false positive rates. Using the introduced approach, the A3D baseline's sensitivity can be increased by 1.04 percentage points in mFROC.
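The reported operating point, sensitivity at one false positive per image, is a single point on the FROC curve; a minimal sketch computing it from scored detections that have already been matched to lesions (the matching itself is assumed done):

```python
import numpy as np

def sensitivity_at_fp(scores, is_tp, num_images, num_lesions, fp_per_image=1.0):
    """One FROC operating point: lesion sensitivity when the confidence
    threshold is set so false positives stay within the per-image budget."""
    order = np.argsort(scores)[::-1]              # descend by detector confidence
    hits = np.asarray(is_tp, dtype=bool)[order]
    tp = np.cumsum(hits)
    fp = np.cumsum(~hits)
    within = fp <= fp_per_image * num_images
    return tp[within][-1] / num_lesions if within.any() else 0.0

print(sensitivity_at_fp(
    scores=[0.9, 0.8, 0.7, 0.6], is_tp=[True, False, True, False],
    num_images=2, num_lesions=3))                 # -> 0.667 at <= 1 FP/image
```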

14.
Article in English | MEDLINE | ID: mdl-38082647

ABSTRACT

As depressive psychiatric disorders become more common, people are gradually starting to take them seriously. Somatisation disorders, though a common mental disorder, are rarely accurately identified in clinical diagnosis owing to their particular nature. In previous work, speech recognition technology was successfully applied to the task of identifying somatisation disorders on the Shenzhen Somatisation Speech Corpus. Nevertheless, labels for somatisation disorder speech databases remain scarce, while current mainstream approaches in speech recognition rely heavily on well-labelled data. Compared to supervised learning, self-supervised learning can achieve the same or even better recognition results while reducing the reliance on labelled samples. Moreover, self-supervised learning can generate general representations without the need for hand-crafted features tailored to different recognition tasks. To this end, we apply pre-trained self-supervised learning models to few-labelled somatisation disorder speech recognition. In this study, we compare and analyse the results of three self-supervised learning models: contrastive predictive coding (CPC), wav2vec, and wav2vec 2.0. The best wav2vec 2.0 model achieves 77.0% unweighted average recall, significantly better than CPC (p < .005) and better than the supervised learning benchmark. Clinical relevance: This work proposes a self-supervised learning model for few-labelled somatisation disorder (SD) speech data, which can assist psychiatrists in clinical diagnosis. With this model, psychiatrists no longer need to spend a lot of time labelling SD speech data.


Subjects
Speech Disorders , Speech , Humans , Benchmarking , Databases, Factual , Supervised Machine Learning
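Extracting pre-trained wav2vec 2.0 representations for a downstream classifier can be sketched with the Hugging Face transformers library; the checkpoint and mean-pooling are assumptions, not the study's setup.

```python
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base-960h")
model = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base-960h")

waveform = torch.randn(16000)                     # 1 s of placeholder 16 kHz audio
inputs = extractor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state    # (1, frames, 768)
utterance_embedding = hidden.mean(dim=1)          # mean-pool for a classifier
```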
15.
Article in English | MEDLINE | ID: mdl-38082715

ABSTRACT

Deep neural networks with attention mechanisms have shown promising results in many computer vision and medical image processing applications. Attention mechanisms help to capture long-range interactions. Recently, more sophisticated attention mechanisms, such as criss-cross attention, have been proposed for efficient computation of attention blocks. In this paper, we introduce a simple and low-overhead approach of adding noise to the attention block, which we find to be very effective when using an attention mechanism. Introducing regularisation into the attention block by adding noise makes the network more robust and resilient, especially in scenarios with limited training data. We incorporate this regularisation mechanism into the criss-cross attention block, which, thus enhanced, is integrated into the bottleneck layer of a U-Net for the task of medical image segmentation. We evaluate the proposed framework on a challenging subset of the NIH dataset for segmenting lung lobes. Our methodology improves Dice scores by 2.5% in this medical image segmentation context.


Subjects
Image Processing, Computer-Assisted , Neural Networks, Computer
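The noise-as-regularisation idea can be sketched as additive Gaussian noise on the attention logits, applied only in training mode; the placement and scale are assumptions, and plain scaled dot-product attention stands in for the criss-cross block.

```python
import torch
import torch.nn as nn

class NoisyAttention(nn.Module):
    """Scaled dot-product attention with Gaussian noise injected into the
    attention logits during training, acting as a regulariser."""
    def __init__(self, noise_std: float = 0.1):
        super().__init__()
        self.noise_std = noise_std

    def forward(self, q, k, v):
        logits = q @ k.transpose(-2, -1) / k.size(-1) ** 0.5
        if self.training:                          # noise only while training
            logits = logits + self.noise_std * torch.randn_like(logits)
        return torch.softmax(logits, dim=-1) @ v

attn = NoisyAttention()
q = k = v = torch.randn(2, 16, 32)                 # (batch, tokens, dim)
out = attn(q, k, v)                                # (2, 16, 32)
```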
16.
Article in English | MEDLINE | ID: mdl-38083307

ABSTRACT

Cardiovascular diseases (CVDs) are the leading cause of death globally. Heart sound signal analysis plays an important role in the clinical detection and physical examination of CVDs. In recent years, auxiliary diagnosis technology for CVDs based on the detection of heart sound signals has become a research hotspot. The detection of abnormal heart sounds can provide important clinical information to help doctors diagnose and treat heart disease. We propose a new set of fractal features - fractal dimension (FD) - as the representation for classification and a support vector machine (SVM) as the classification model. The method comprises segmenting heart sounds, extracting features, and classifying abnormal heart sounds. We compare the classification results of the heart sound waveform (time domain) and the spectrum (frequency domain) based on fractal features. Finally, according to the classification results, we select the fractal features most conducive to classification to obtain better performance. The features we propose significantly outperform widely used features (p < .05 by one-tailed z-test) with a much lower dimensionality. Clinical relevance: The fractal-based heart sound classification model provides a new time-frequency analysis method for heart sound signals, and a new effective mechanism is proposed to explore the relationship between the acoustic properties of heart sounds and the pathology of CVDs. As a non-invasive diagnostic method, this work could supply an idea for the preliminary screening of cardiac abnormalities through heart sounds.


Subjects
Cardiovascular Diseases , Heart Diseases , Heart Sounds , Humans , Fractals , Heart Auscultation
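Fractal dimension as a classification feature can be illustrated with the Higuchi estimator feeding an SVM; the estimator variant and the toy signals are assumptions, since the abstract does not state which FD definition is used.

```python
import numpy as np
from sklearn.svm import SVC

def higuchi_fd(x, k_max=8):
    """Higuchi estimate of the fractal dimension of a 1-D signal."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    log_inv_k, log_len = [], []
    for k in range(1, k_max + 1):
        curve_lens = []
        for m in range(k):
            idx = np.arange(m, n, k)
            # curve length at scale k, offset m, with Higuchi's normalisation
            curve_lens.append(
                np.abs(np.diff(x[idx])).sum() * (n - 1) / ((len(idx) - 1) * k * k))
        log_inv_k.append(np.log(1.0 / k))
        log_len.append(np.log(np.mean(curve_lens)))
    slope, _ = np.polyfit(log_inv_k, log_len, 1)   # FD is the log-log slope
    return slope

# Toy demo: smoother vs. rougher signals yield separable FD features.
rng = np.random.default_rng(2)
smooth = [np.sin(np.linspace(0, 20, 1000)) + 0.05 * rng.normal(size=1000)
          for _ in range(20)]
rough = [rng.normal(size=1000) for _ in range(20)]
X = np.array([[higuchi_fd(s)] for s in smooth + rough])
y = [0] * 20 + [1] * 20
print(SVC().fit(X, y).score(X, y))                 # training accuracy on toy data
```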
17.
Article in English | MEDLINE | ID: mdl-38083410

ABSTRACT

Human behavioural expressions, such as those of confidence, are time-varying entities. Both the vocal and facial cues that convey human confidence expressions keep varying throughout the duration of analysis. Although the cues from these two modalities are not always in synchrony, they impact each other as well as the fused outcome. In this paper, we present a deep fusion technique to combine the two modalities and derive a single outcome to infer human confidence. The fused outcome improves classification performance by capturing the temporal information from both modalities. We also present an analysis of the time-varying nature of expressions in conversations captured in an interview setup. We collected data from 51 speakers who participated in interview sessions. In 5-fold cross-validation, the average area under the curve (AUC) of uni-modal models using speech and facial expressions is 70.6% and 69.4%, respectively, for classifying confident versus non-confident videos. Our deep fusion model improves the performance, giving an average AUC of 76.8%.


Subjects
Speech Perception , Voice , Humans , Speech , Communication , Mental Processes
18.
Article in English | MEDLINE | ID: mdl-38083586

ABSTRACT

Cardiovascular diseases (CVDs) are the number one cause of death worldwide. In recent years, intelligent auxiliary diagnosis of CVDs based on computer audition has become a popular research field, and intelligent diagnosis technology is increasingly mature. However, neural networks used to monitor CVDs are becoming more complex, requiring more computing power and memory, and are difficult to deploy in wearable devices. This paper proposes a lightweight model for classifying heart sounds based on knowledge distillation, which can be deployed in wearable devices to monitor the heart sounds of wearers. The network model is designed based on convolutional neural networks (CNNs). Model performance is evaluated by extracting Mel frequency cepstral coefficient (MFCC) features from the PhysioNet/CinC Challenge 2016 dataset. The experimental results show that knowledge distillation can improve a lightweight network's accuracy and that our model performs well on the test set. In particular, when the knowledge distillation temperature is 7 and the weight α is 0.1, the accuracy is 88.5%, the recall is 83.8%, and the specificity is 93.6%. Clinical relevance: A lightweight heart sound classification model based on knowledge distillation can be deployed on various hardware devices for timely monitoring of and feedback on the physical condition of patients with CVDs and timely provision of medical advice. When the model is deployed on hospital medical instruments, the condition of severe and hospitalised patients can be fed back promptly and clinical treatment advice provided to clinicians.


Subjects
Cardiovascular Diseases , Deep Learning , Heart Sounds , Wearable Electronic Devices , Humans , Neural Networks, Computer
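The distillation objective can be sketched in its standard form: hard-label cross-entropy weighted by α plus a temperature-softened KL term to the teacher scaled by T². Here T = 7 and α = 0.1 follow the reported best setting, though the paper's exact combination of terms may differ.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=7.0, alpha=0.1):
    """Standard knowledge distillation: hard-label CE weighted by alpha plus
    temperature-softened KL to the teacher, scaled by T^2."""
    hard = F.cross_entropy(student_logits, labels)
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    return alpha * hard + (1 - alpha) * soft

# Toy batch: 4 heart sound segments, binary normal/abnormal logits
loss = distillation_loss(
    torch.randn(4, 2), torch.randn(4, 2), torch.tensor([0, 1, 0, 1]))
```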
19.
Article in English | MEDLINE | ID: mdl-38083758

ABSTRACT

Music can effectively induce specific emotions and is commonly used in clinical treatment or intervention. The electroencephalogram (EEG) can help reflect the impact of music. Previous studies showed that existing methods achieve relatively good performance in predicting emotional responses to music; however, these methods tend to be time-consuming and expensive due to their complexity. To this end, this study proposes a grey wolf optimiser-based method to predict induced emotion by fusing electroencephalogram features and music features. Experimental results show that the proposed method reaches promising performance in predicting emotional responses to music and outperforms the alternative method. In addition, we analyse the relationship between the music features and the electroencephalogram features, and the results demonstrate that musical timbre features are significantly related to the electroencephalogram features. Clinical relevance: This study targets the automatic prediction of the human response to music and further explores the correlation between EEG features and music features, aiming to provide a basis for extending the application of music. The grey wolf optimiser-based method proposed in this study could supply a promising avenue for predicting music-induced emotion.


Subjects
Music , Wolves , Humans , Animals , Music/psychology , Pilot Projects , Brain/physiology , Electroencephalography/methods
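A minimal grey wolf optimiser loop on a toy objective; the dimensionality, bounds, and sphere function are placeholders rather than the study's EEG/music feature-fusion objective.

```python
import numpy as np

def gwo(objective, dim=5, n_wolves=20, iters=100, lb=-5.0, ub=5.0, seed=0):
    """Minimise `objective` with a basic grey wolf optimiser: wolves move
    towards the three current best solutions (alpha, beta, delta)."""
    rng = np.random.default_rng(seed)
    wolves = rng.uniform(lb, ub, size=(n_wolves, dim))
    for t in range(iters):
        fitness = np.apply_along_axis(objective, 1, wolves)
        leaders = wolves[np.argsort(fitness)[:3]]          # alpha, beta, delta
        a = 2.0 * (1 - t / iters)                          # decreases 2 -> 0
        for i in range(n_wolves):
            candidate = np.zeros(dim)
            for leader in leaders:
                r1, r2 = rng.random(dim), rng.random(dim)
                A, C = 2 * a * r1 - a, 2 * r2
                candidate += leader - A * np.abs(C * leader - wolves[i])
            wolves[i] = np.clip(candidate / 3.0, lb, ub)
    return wolves[np.argmin(np.apply_along_axis(objective, 1, wolves))]

best = gwo(lambda x: np.sum(x ** 2))                       # sphere test function
print(np.round(best, 3))                                   # close to the zero vector
```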